Instance Pruning Techniques
Authors
Abstract
The nearest neighbor algorithm and its derivatives are often quite successful at learning a concept from a training set and providing good generalization on subsequent input vectors. However, these techniques often retain the entire training set in memory, resulting in large memory requirements and slow execution speed, as well as a sensitivity to noise. This paper provides a discussion of issues related to reducing the number of instances retained in memory while maintaining (and sometimes improving) generalization accuracy, and mentions algorithms other researchers have used to address this problem. It presents three intuitive noise-tolerant algorithms that can be used to prune instances from the training set. In experiments on 29 applications, the algorithm that achieves the highest reduction in storage also results in the highest generalization accuracy of the three methods.
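The paper's three noise-tolerant algorithms are not reproduced in this abstract, but the underlying idea can be sketched: drop a stored instance whenever the remaining set still classifies the training data correctly. The sketch below is a minimal, illustrative decremental rule for a k-NN classifier; the function names and the leave-one-out style check are assumptions, not the paper's actual algorithms.

```python
import numpy as np

def knn_predict(X, y, query, k=3):
    """Majority vote among the k nearest stored instances (integer labels assumed)."""
    dists = np.linalg.norm(X - query, axis=1)
    nearest = np.argsort(dists)[:k]
    return np.bincount(y[nearest]).argmax()

def prune_instances(X, y, k=3):
    """Greedy decremental pruning: drop an instance if the remaining set
    still classifies every training instance correctly."""
    keep = np.ones(len(X), dtype=bool)
    for i in range(len(X)):
        keep[i] = False
        if not keep.any():          # never prune away the last instance
            keep[i] = True
            continue
        Xr, yr = X[keep], y[keep]
        ok = all(knn_predict(Xr, yr, X[j], k) == y[j] for j in range(len(X)))
        if not ok:
            keep[i] = True          # removal hurt accuracy, so restore it
    return X[keep], y[keep]
```

A real pruning method would typically order the removal candidates (e.g., by distance to the nearest enemy) and tolerate some misclassified noisy instances rather than requiring perfect accuracy, but the accept-or-restore loop above captures the basic storage-reduction mechanism.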
Similar References
Language model size reduction by pruning and clustering
Several techniques are known for reducing the size of language models, including count cutoffs [1], Weighted Difference pruning [2], Stolcke pruning [3], and clustering [4]. We compare all of these techniques and show some surprising results. For instance, at low pruning thresholds, Weighted Difference and Stolcke pruning underperform count cutoffs. We then show novel clustering techniques that...
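As a toy illustration of the simplest technique listed, count cutoffs, the sketch below drops rare n-grams from a count table; the cutoff value and the example counts are purely illustrative, not taken from the paper.

```python
from collections import Counter

def count_cutoff(ngram_counts, cutoff=1):
    """Keep only n-grams whose count exceeds the cutoff; pruned n-grams
    would be handled by backing off to lower-order estimates."""
    return Counter({ng: c for ng, c in ngram_counts.items() if c > cutoff})

# Illustrative bigram counts, not real data.
bigrams = Counter({("the", "cat"): 5, ("cat", "sat"): 1, ("sat", "on"): 3})
pruned = count_cutoff(bigrams, cutoff=1)   # drops ("cat", "sat")
```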
A Double Pruning Algorithm for Classification Ensembles
This article introduces a double pruning algorithm that can be used to reduce storage requirements, speed up the classification process, and improve the performance of parallel ensembles. A key element in the design of the algorithm is the estimation of the class label that the ensemble assigns to a given test instance by polling only a fraction of its classifiers. Instead of applying this f...
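One plausible reading of "polling only a fraction of its classifiers" is an early-stopping vote: stop querying ensemble members once the leading class can no longer be overtaken. The sketch below assumes each classifier is a callable returning a class label and is not necessarily the article's exact procedure.

```python
from collections import Counter

def early_vote(classifiers, x):
    """Poll ensemble members one by one and stop once the leading class
    cannot be overtaken by the votes still outstanding."""
    votes = Counter()
    remaining = len(classifiers)
    for clf in classifiers:
        votes[clf(x)] += 1
        remaining -= 1
        top_two = votes.most_common(2)
        lead = top_two[0][1] - (top_two[1][1] if len(top_two) > 1 else 0)
        if lead > remaining:       # runner-up can no longer catch up
            break
    return votes.most_common(1)[0][0]
```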
Optimal strategy in games with chance nodes
In this paper, games with chance nodes are analysed. Such game trees are evaluated with the expectiminimax algorithm. We present pruning techniques that handle random effects: gamma-pruning aims to increase the efficiency of expectiminimax, analogously to what alpha-beta pruning does for classical minimax. Some interesting properties of these games are shown: for instance, a game without dr...
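For context, a plain expectiminimax evaluator (without the gamma-pruning the paper describes, whose details are not given in this snippet) can be sketched as follows; the node representation is an assumption made for illustration.

```python
def expectiminimax(node):
    """Evaluate a tree with MAX, MIN and CHANCE nodes.

    A leaf is a plain number; an inner node is assumed to expose `kind`
    ("max", "min" or "chance") and `children` (chance children are
    (probability, child) pairs).  This layout is illustrative only.
    """
    if isinstance(node, (int, float)):
        return float(node)
    if node.kind == "chance":
        return sum(p * expectiminimax(child) for p, child in node.children)
    values = [expectiminimax(child) for child in node.children]
    return max(values) if node.kind == "max" else min(values)
```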
A Pruning Based Approach for Scalable Entity Coreference
Entity coreference is the process of deciding which identifiers (e.g., person names, locations, ontology instances) refer to the same real-world entity. In the Semantic Web, entity coreference can be used to detect equivalence relationships between heterogeneous Semantic Web datasets, explicitly linking coreferent ontology instances via the owl:sameAs property. Due to the large scale of Sema...
Optimistic pruning for multiple instance learning
This paper introduces a simple evaluation function for multiple instance learning that admits an optimistic pruning strategy. We demonstrate results comparable to state-of-the-art methods while using significantly fewer computational resources.
Pruning of redundant synthesis instances based on weighted vector quantization
A new method is proposed for pruning redundant synthesis unit instances in a large-scale synthesis database, based on weighted vector quantization (WVQ). WVQ takes the relative importance of each instance into account when clustering similar instances with the vector quantization (VQ) technique. The proposed method was compared with two conventional pruning methods through objective and subjective ...
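One way to picture weight-aware pruning, not necessarily the paper's WVQ formulation, is a weighted k-means pass that keeps a single representative instance per codeword; the function, parameter names, and codeword count below are illustrative assumptions (and `n_codewords` must not exceed the number of instances).

```python
import numpy as np

def weighted_vq_prune(X, weights, n_codewords=16, iters=20, seed=0):
    """Weighted k-means over the instances, then keep one representative
    instance per codeword; everything else is treated as redundant."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_codewords, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every instance to its nearest codeword.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        for c in range(n_codewords):
            mask = assign == c
            if mask.any():
                w = weights[mask][:, None]          # per-instance importance
                centroids[c] = (w * X[mask]).sum(axis=0) / w.sum()
    # Retain the instance closest to each codeword as its representative.
    reps = {np.linalg.norm(X - c, axis=1).argmin() for c in centroids}
    return X[sorted(reps)]
```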